Support VLM processors in `is_chat_template_prefix_preserving` by qgallouedec · Pull Request #5558 · huggingface/trl

qgallouedec · 2026-04-15T16:03:51Z

Extend is_chat_template_prefix_preserving to accept VLM processors (not only tokenizers), and update its type hint accordingly.

Context: https://github.com/huggingface/trl/pull/5489/changes#r3087655676

Why

For VLMs, processor.apply_chat_template is not just an alias for processor.tokenizer.apply_chat_template. Checking prefix-preservation on the inner tokenizer can therefore diverge from what actually happens at training time. We want to call the check on the processor whenever one is available.

Note

Low Risk
Small, well-scoped change to a utility check and its tests; main risk is introducing an extra PIL/image dependency path when running the prefix-preservation check for processors.

Overview
is_chat_template_prefix_preserving now accepts either a PreTrainedTokenizer or a VLM ProcessorMixin and runs the prefix check via processing_class.apply_chat_template.

For processors, the check now builds multimodal messages (including a dummy image) via prepare_multimodal_messages so image-token expansion is exercised, and tests add a new require_vision case validating prefix-preservation on a processor template.

^{Reviewed by Cursor Bugbot for commit fee8f7e. Bugbot is set up for automated code reviews on this repo. Configure here.}

HuggingFaceDocBuilderDev · 2026-04-15T16:06:35Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 355700a58c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

albertvillanova

Thanks.

Support VLM processors in is_chat_template_prefix_preserving

355700a

qgallouedec requested review from AmineDiro, albertvillanova and kashif April 15, 2026 16:03

chatgpt-codex-connector Bot reviewed Apr 15, 2026

View reviewed changes

Comment thread tests/test_chat_template_utils.py

This was referenced Apr 15, 2026

Check prefix preservation at the token level #5559

Merged

Set _tokenizer as trainer attribute #5489

Merged

qgallouedec and others added 2 commits April 15, 2026 17:49

requires vision

b08cc7c

Merge branch 'main' into vlm-is-chat-template-prefix-preserving

fee8f7e

albertvillanova approved these changes Apr 17, 2026

View reviewed changes

qgallouedec merged commit 4595347 into main Apr 17, 2026
12 of 13 checks passed

qgallouedec deleted the vlm-is-chat-template-prefix-preserving branch April 17, 2026 13:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support VLM processors in `is_chat_template_prefix_preserving`#5558

Support VLM processors in `is_chat_template_prefix_preserving`#5558
qgallouedec merged 3 commits into
mainfrom
vlm-is-chat-template-prefix-preserving

qgallouedec commented Apr 15, 2026 •

edited by cursor Bot

Loading

Uh oh!

HuggingFaceDocBuilderDev commented Apr 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

albertvillanova left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

qgallouedec commented Apr 15, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Uh oh!

HuggingFaceDocBuilderDev commented Apr 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

albertvillanova left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

qgallouedec commented Apr 15, 2026 •

edited by cursor Bot

Loading